Central Processing Unit Architecture

1 Central Processing Unit Architecture ( ), academic year 5763 (2002-2003), Semester A. July 2, 2008. Hugo Guterman. Arch. CPU L8 Cache Intr. 1/77

2 Memory Hierarchy

3 Why hierarchy works
The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference plotted over the address space, from 0 to 2^n - 1]

4 Memory Hierarchy: How does it work?

5 Memory Hierarchy: Terminology
Hit: data appears in some block in the upper level (example: Block X)
  Hit Rate: the fraction of memory accesses found in the upper level
  Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (Block Y)
  Miss Rate = 1 - (Hit Rate)
  Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
Hit Time << Miss Penalty
[Figure: the processor exchanges data with the Upper Level Memory (Blk X), which exchanges blocks with the Lower Level Memory (Blk Y)]
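The terms above combine into one figure of merit. A minimal sketch, assuming the standard relation average access time = Hit Time + Miss Rate x Miss Penalty; the function name and the sample numbers (1-cycle hit, 95% hit rate, 50-cycle penalty) are illustrative and not taken from the slides:

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate  # Miss Rate = 1 - (Hit Rate)
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle hit, 95% hit rate, 50-cycle miss penalty.
print(average_access_time(1, 0.95, 50))
```

Because Hit Time << Miss Penalty, even a small miss rate can dominate the result: here the 5% of accesses that miss contribute more time than the 95% that hit.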

6 Cache Memory

7 Basic Cache Design

8 Cache Example (1)

9 Cache Example (2)

10 Cache Example (3)

11 Cache Example (4)

12 Cache Example (5)

13 Cache Example (6)

14 Cache Example (7)

15 Cache Example (8)

16 Cache Example (9)

17 Cache Example (10)

18 Cache Example (11)

19 Cache Example (12)

20 Cache Example (13)

21 Cache Example (14)

22 Cache Example (15)

23 Compare Cache with non Cache

24 The basics of Cache

25 Q1: Block Placement

26 Q2: Block Identification

27 Direct-mapped cache example

28 Fully-Associative Cache

29 Q3: Block Replacement

30 Simplest Cache: Direct Mapped
[Figure: a Memory column with its addresses (listed up to F) next to a 4 Byte Direct Mapped Cache with its Cache Index]
Location 0 can be occupied by data from:
  Memory location 0, 4, 8, ... etc.
  In general: any memory location whose 2 LSBs of the address are 0s
  Address<1:0> => cache index
Which one should we place in the cache?
How can we tell which one is in the cache?
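The mapping on this slide can be sketched in a few lines. This is an illustration only: it assumes one byte per cache location and a 16-byte memory (addresses 0 to F), and the function names are made up for the example.

```python
CACHE_SIZE = 4  # bytes; assumed one byte per cache location


def cache_index(address):
    """Cache location for a byte address: Address<1:0>."""
    return address & (CACHE_SIZE - 1)


def addresses_for_index(index, memory_size=16):
    """Every memory address that competes for one cache location."""
    return [a for a in range(memory_size) if cache_index(a) == index]


print(addresses_for_index(0))  # [0, 4, 8, 12]
```

Several addresses share each cache location, so the index alone cannot say which one is resident. The remaining upper address bits (address >> 2 in this sketch) are what tells them apart.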

31 1 KB Direct Mapped Cache, 32B blocks
For a 2 ** N byte cache:
  The uppermost (32 - N) bits are always the Cache Tag
  The lowest M bits are the Byte Select (Block Size = 2 ** M)
[Figure: the address is split into Cache Tag (example: 0x50), Cache Index (ex: 0x01) and Byte Select (ex: 0x00). Each cache entry holds a Valid Bit, a Cache Tag (stored as part of the cache state) and 32 bytes of Cache Data: Byte 0 to Byte 31 in the first entry, Byte 32 to Byte 63 in the second entry (tag 0x50), and so on up to Byte 1023 in the last entry.]
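The address split can be checked with a short sketch. It assumes 32-bit byte addresses, N = 10 (1 KB) and M = 5 (32-byte blocks) as on the slide; the constant and function names are illustrative.

```python
CACHE_BITS = 10   # N: 2 ** 10 bytes = 1 KB cache
BLOCK_BITS = 5    # M: 2 ** 5 bytes = 32-byte blocks
INDEX_BITS = CACHE_BITS - BLOCK_BITS  # 5 bits, so 32 cache blocks


def split_address(address):
    """Return (tag, index, byte_select) for a 32-bit byte address."""
    byte_select = address & ((1 << BLOCK_BITS) - 1)
    index = (address >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> CACHE_BITS  # the uppermost (32 - N) bits
    return tag, index, byte_select


tag, index, byte_select = split_address(0x14020)
print(hex(tag), hex(index), hex(byte_select))  # 0x50 0x1 0x0
```

The address 0x14020 is chosen so that it reproduces the slide's example fields: tag 0x50, index 0x01, byte select 0x00.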

32 Two-way Set Associative Cache
N-way set associative: N entries for each Cache Index
  N direct mapped caches operate in parallel (N typically 2 to 4)
Example: Two-way set associative cache
  Cache Index selects a set from the cache
  The two tags in the set are compared in parallel
  Data is selected based on the tag result
[Figure: two banks, each with Valid, Cache Tag and Cache Data (Cache Block 0, ...), both selected by the Cache Index. The address tag (Adr Tag) is compared with both stored tags; the Compare outputs drive Sel1 and Sel0 of a Mux that selects the Cache Block, and are ORed to produce Hit.]
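A software model of the lookup may make the three steps concrete. This is a sketch under assumed parameters (4 sets, 4-byte blocks); the class and method names are hypothetical, and the loop stands in for the two comparators that hardware runs in parallel.

```python
class TwoWaySetAssociativeCache:
    """Tag store only: tracks which blocks are resident, not their data."""

    def __init__(self, num_sets=4, block_bytes=4):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        # Each set holds two ways; a way is a (valid, tag) pair.
        self.sets = [[(False, 0), (False, 0)] for _ in range(num_sets)]

    def _split(self, address):
        block_number = address // self.block_bytes
        tag = block_number // self.num_sets
        index = block_number % self.num_sets
        return tag, index

    def lookup(self, address):
        """Return the matching way (0 or 1) on a hit, or None on a miss."""
        tag, index = self._split(address)
        for way, (valid, stored_tag) in enumerate(self.sets[index]):
            if valid and stored_tag == tag:
                return way
        return None

    def fill(self, address, way):
        """Place the block containing `address` into the given way."""
        tag, index = self._split(address)
        self.sets[index][way] = (True, tag)


cache = TwoWaySetAssociativeCache()
cache.fill(0x00, way=0)
cache.fill(0x10, way=1)    # same index as 0x00, different tag
print(cache.lookup(0x00))  # 0
print(cache.lookup(0x10))  # 1
print(cache.lookup(0x20))  # None
```

With these parameters, addresses 0x00 and 0x10 map to the same index. A direct mapped cache of the same geometry could hold only one of them; the two-way cache keeps both.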

33 Disadvantage of Set Associative Cache
N-way Set Associative Cache vs. Direct Mapped Cache:
  N comparators vs. 1
  Extra MUX delay for the data
  Data comes AFTER Hit/Miss
In a direct mapped cache, Cache Block is available BEFORE Hit/Miss:
  Possible to assume a hit and continue. Recover later if miss.
[Figure: the two-way set associative datapath, with Valid, Cache Tag and Cache Data per bank, two tag comparators, a Mux selecting the Cache Block, and an OR gate producing Hit]

34 Q3: Block Replacement

35 Q3: Which block should be replaced on a miss?
Easy for Direct Mapped
Set Associative or Fully Associative:
  Random
  LRU (Least Recently Used)

Miss rates:
Associativity:   2-way            4-way            8-way
Size             LRU     Random   LRU     Random   LRU     Random
16 KB            5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
64 KB            1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
256 KB           1.15%   1.17%    1.13%   1.13%    1.12%   1.12%
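The two policies can be compared on a toy trace. The sketch below models a single set of `ways` blocks and counts misses; the function names and the trace are invented for illustration and say nothing about how the figures in the table were measured.

```python
import random
from collections import OrderedDict


def count_misses_lru(trace, ways):
    """Misses for one set of `ways` blocks under LRU replacement."""
    resident = OrderedDict()  # least recently used entry comes first
    misses = 0
    for block in trace:
        if block in resident:
            resident.move_to_end(block)  # now the most recently used
        else:
            misses += 1
            if len(resident) == ways:
                resident.popitem(last=False)  # evict least recently used
            resident[block] = True
    return misses


def count_misses_random(trace, ways, rng=None):
    """Misses for one set of `ways` blocks under random replacement."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    resident = []
    misses = 0
    for block in trace:
        if block not in resident:
            misses += 1
            if len(resident) == ways:
                resident.pop(rng.randrange(ways))  # evict a random way
            resident.append(block)
    return misses


trace = [1, 2, 1, 3, 1, 4, 1, 5]
print(count_misses_lru(trace, ways=2))     # 5
print(count_misses_random(trace, ways=2))  # between 5 and 8
```

On this trace LRU never evicts the frequently reused block 1, so only the five first-time references miss. Random replacement may evict block 1 and pay extra misses, which matches the direction of the gap in the table for small caches.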

36 Q4: Write Strategy

37 What happens on a write?
Write through: The information is written to both the block in the cache and to the block in the lower-level memory.
Write back: The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  Is the block clean or dirty?
Pros and Cons of each?
  WT: read misses cannot result in writes
  WB: no repeated writes to same location
WT is always combined with write buffers so that the processor doesn't wait for lower level memory
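The "no repeated writes to same location" advantage of write back can be illustrated by counting lower-level writes for a store trace. The sketch assumes write allocate, LRU replacement and a final flush of dirty blocks, none of which the slide specifies; the function names are hypothetical.

```python
from collections import OrderedDict


def write_through_memory_writes(stores):
    """Write through: every store also goes to the lower level."""
    return len(stores)


def write_back_memory_writes(stores, cache_blocks=1):
    """Write back: the lower level is written only when a dirty block leaves."""
    dirty = OrderedDict()  # resident dirty blocks, least recently used first
    memory_writes = 0
    for block in stores:
        if block in dirty:
            dirty.move_to_end(block)  # write hit: update the cache only
            continue
        if len(dirty) == cache_blocks:
            dirty.popitem(last=False)  # replace a dirty block
            memory_writes += 1         # so it must be written to memory
        dirty[block] = True
    return memory_writes + len(dirty)  # assumed final flush of dirty blocks


stores = ["A"] * 10 + ["B"]
print(write_through_memory_writes(stores))  # 11
print(write_back_memory_writes(stores))     # 2
```

Ten stores to block A cost ten memory writes under write through but only one under write back, paid when A is replaced by B.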

38 Write Buffer for Write Through
[Figure: Processor, Cache and DRAM, with a Write Buffer on the path to DRAM]
A Write Buffer is needed between the Cache and Memory
  Processor: writes data into the cache and the write buffer
  Memory controller: writes contents of the buffer to memory
Write buffer is just a FIFO:
  Typical number of entries: 4
  Works fine if: Store frequency (w.r.t. time) << 1 / DRAM write cycle
Memory system designer's nightmare:
  Store frequency (w.r.t. time) -> 1 / DRAM write cycle
  Write buffer saturation
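Saturation can be seen in a small timing model. This is a rough sketch, not a description of any real memory controller: it assumes stores arrive at a fixed interval, DRAM retires one write per write cycle in FIFO order, and the processor stalls when all entries are occupied. All names and numbers are illustrative.

```python
from collections import deque


def count_stalls(num_stores, store_interval, dram_write_cycle, entries=4):
    """Count stores that find the write buffer full and must wait."""
    pending = deque()  # completion times of buffered writes, oldest first
    last_done = 0      # time at which DRAM finishes its latest write
    time = 0           # current processor time, in cycles
    stalls = 0
    for _ in range(num_stores):
        # Free the entries whose DRAM write has already completed.
        while pending and pending[0] <= time:
            pending.popleft()
        if len(pending) == entries:
            stalls += 1
            time = pending.popleft()  # wait for the oldest write to finish
        start = max(time, last_done)  # DRAM handles one write at a time
        last_done = start + dram_write_cycle
        pending.append(last_done)
        time += store_interval
    return stalls


# Stores much rarer than DRAM writes: the buffer never fills.
print(count_stalls(100, store_interval=10, dram_write_cycle=2))  # 0
# Stores more frequent than DRAM can retire them: the buffer saturates.
print(count_stalls(100, store_interval=1, dram_write_cycle=4) > 0)  # True
```

In the second case, a deeper buffer only delays the first stall, because the store rate exceeds what DRAM can retire on average.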

39 Q4: Write Strategy (write allocation)

40 Summary of Cache Questions

41 Metric for Cache Performance: Miss Rate

42 Metric for Cache Performance: AMAT

43 Average Memory Access Time Example

44 Example

45 Miss-Rate

46 Miss-Rate

47 Example

48 Example

49 Example

50 Cache Performance Design Parameters

51 Reducing the Cache Miss Rate

52 Sources of Cache Misses

53 Sources of Cache Misses

54 Compulsory, Conflict and Capacity Miss Rates

55 Techniques for reducing Miss Rates

56 Techniques for reducing Miss Rates: Larger Blocks

57 Miss Rate vs. Line Size

58 Average Memory Access Time

59 Higher Associativity

60 Average Memory Access Time

61 Example 8KB 2-Way Set-associative memory

62 Victim Caches

63 Hardware Pre-Fetching

64 Compiler Prefetch

65 Compiler Prefetching

66 Techniques for reducing miss penalties

67 Give reads priority over writes

68 Sub-block placement

69 Early restart & Critical word first

70 2nd Level Caches

71 2nd Level Cache

72 Reducing Hit Time

73 Cycle time vs. Access time

74 Simple Main Memory

75 Wider Main Memory

76 Interleave Memory

77 Summary: Levels of Memory Hierarchy